1 Cluster ensembles - a knowledge reuse framework for combining multiple partitions



Alexander Strehl, Joydeep Ghosh 

March 2003 The Journal of Machine Learning Research, volume 3 
Publisher: MIT Press 

Additional Information: full citation, abstract, references, citings, index terms



Full text available: pdf(842.50 KB)



This paper introduces the problem of combining multiple partitionings of a set of objects 
into a single consolidated clustering without accessing the features or algorithms that 
determined these partitionings. We first identify several application scenarios for the 
resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster 
ensemble problem is then formalized as a combinatorial optimization problem in terms of 
shared mutual information. In addition to a direct ... 

Keywords: cluster analysis, clustering, consensus functions, ensemble, knowledge reuse, 
multi-learner systems, mutual information, partitioning, unsupervised learning 



2 Session 8 (Tuesday, June 6th, 3:15-4:30 pm): Optimal succinct representations of
planar maps

Luca Castelli Aleardi, Olivier Devillers, Gilles Schaeffer

June 2006 Proceedings of the twenty-second annual symposium on Computational
geometry SCG '06
Publisher: ACM Press 

Full text available: pdf(271.32 KB) Additional Information: full citation, abstract, references, index terms

This paper addresses the problem of representing the connectivity information of 
geometric objects using as little memory as possible. As opposed to raw compression 
issues, the focus is here on designing data structures that preserve the possibility of 
answering incidence queries in constant time. We propose in particular the first optimal 
representations for 3-connected planar graphs and triangulations, which are the most 
standard classes of graphs underlying meshes with spherical topology. Opt ... 

Keywords: compression, geometric data structures, graph encoding, mesh, planar maps, 
succinct data structures 

Tilman Lange, Joachim M. Buhmann

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(252.35 KB) Additional Information: full citation, abstract, references, citings, index terms

Data clustering represents an important tool in exploratory data analysis. The lack of 
objective criteria renders model selection as well as the identification of robust solutions
particularly difficult. The use of a stability assessment and the combination of multiple 
clustering solutions represents an important ingredient to achieve the goal of finding 
useful partitions. In this work, we propose a novel way of combining multiple clustering 
solutions for both hard and soft partitions: the appro ...

Keywords: clustering, consensus partition, re-sampling 



4 Session 15A: 2-source dispersers for sub-polynomial entropy and Ramsey graphs
beating the Frankl-Wilson construction
Boaz Barak, Anup Rao, Ronen Shaltiel, Avi Wigderson 

May 2006 Proceedings of the thirty-eighth annual ACM symposium on Theory of 
computing STOC '06 

Publisher: ACM Press 

Full text available: pdf(213.28 KB) Additional Information: full citation, abstract, references, citings, index terms

The main result of this paper is an explicit disperser for two independent sources on n
bits, each of entropy k = n^{o(1)}. Put differently, setting N = 2^n and K = 2^k, we construct
explicit N x N Boolean matrices for which no K x K submatrix is monochromatic. Viewed as
adjacency matrices of bipartite graphs, this gives an explicit construction of K-Ramsey
bipartite graphs of size N. This greatly improves the previous bound of k = o(n) of Barak,
Kindler, Shaltiel, Suda ... 

Keywords: Ramsey graphs, dispersers, extractors, independent sources 



5 Estimating point-to-point and point-to-multipoint traffic matrices: an
information-theoretic approach

Yin Zhang, Matthew Roughan, Carsten Lund, David L. Donoho

October 2005 IEEE/ACM Transactions on Networking (TON), Volume 13 issue 5 

Publisher: IEEE Press 

Full text available: pdf(686.66 KB) Additional Information: full citation, abstract, references, index terms

Traffic matrices are required inputs for many IP network management tasks, such as 
capacity planning, traffic engineering, and network reliability analysis. However, it is 
difficult to measure these matrices directly in large operational IP networks, so there has 
been recent interest in inferring traffic matrices from link measurements and other more 
easily measured data. Typically, this inference problem is ill-posed, as it involves 
significantly more unknowns than data. Experience in many scie ... 

Keywords: SNMP, failure analysis, information theory, minimum mutual information, 
point-to-multipoint, point-to-point, regularization, traffic engineering, traffic matrix 
estimation 



6 Traffic engineering: An information-theoretic approach to traffic matrix estimation
Yin Zhang, Matthew Roughan, Carsten Lund, David Donoho 

August 2003 Proceedings of the 2003 conference on Applications, technologies,
architectures, and protocols for computer communications SIGCOMM '03

Publisher: ACM Press 

Full text available: pdf(421.04 KB) Additional Information: full citation, abstract, references, citings, index terms

Traffic matrices are required inputs for many IP network management tasks: for instance, 
capacity planning, traffic engineering and network reliability analysis. However, it is 
difficult to measure these matrices directly, and so there has been recent interest in 
inferring traffic matrices from link measurements and other more easily measured data. 
Typically, this inference problem is ill-posed, as it involves significantly more unknowns 
than data. Experience in many scientific and engineering f ... 

Keywords: SNMP, information theory, minimum mutual information, regularization,
traffic engineering, traffic matrix estimation 



7 Multi Relational Data Mining (MRDM): State of the art of graph-based data mining
Takashi Washio, Hiroshi Motoda 

July 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 1
Publisher: ACM Press 

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings

The need for mining structured data has increased in the past few years. One of the best 
studied data structures in computer science and discrete mathematics are graphs. It can 
therefore be no surprise that graph based data mining has become quite popular in the 
last few years. This article introduces the theoretical basis of graph based data mining and
surveys the state of the art of graph-based data mining. Brief descriptions of some 
representative approaches are provided as well. 

Keywords: data mining, graph, graph-based data mining, path, structured data, tree 



8 Paper session KM-3 (knowledge management): classification & clustering: Clustering
high-dimensional data using an efficient and effective data space reduction
Ratko Orlandic, Ying Lai, Wai Gen Yee 

October 2005 Proceedings of the 14th ACM international conference on Information 
and knowledge management CIKM '05 

Publisher: ACM Press 

Full text available: pdf(204.82 KB) Additional Information: full citation, abstract, references, index terms

This paper introduces a new algorithm for clustering data in high-dimensional feature
spaces, called GARDEN WD. The algorithm is organized around the notion of data space
reduction, i.e. the process of detecting dense areas (dense cells) in the space. It performs
effective and efficient elimination of empty areas that characterize typical
high-dimensional spaces and an efficient adjacency-connected agglomeration of dense cells
into larger clusters. It produces a compact represen ... 

Keywords: data clustering, data dimensionality, data mining, space partitioning 




9 Content 2: image clustering: Iteratively clustering web images based on link and
attribute reinforcements

Xin-Jing Wang, Wei-Ying Ma, Lei Zhang, Xing Li 

November 2005 Proceedings of the 13th annual ACM international conference on 
Multimedia MULTIMEDIA '05 

Publisher: ACM Press 

Full text available: pdf(248.02 KB) Additional Information: full citation, abstract, references, index terms

Image clustering is an important research topic which contributes to a wide range of 
applications. Traditional image clustering approaches are based on image content features 
only, while content features alone can hardly describe the semantics of the images. In the 
context of Web, images are no longer assumed homogeneous and "flat" distributed but are
richly structured. There are two kinds of reinforcements embedded in such data: 1) the 
reinforcement between attributes of different data types (int ... 

Keywords: image clustering, iterative reinforcement, link mining 



10 Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric
and Nonmetric Spaces

Michael Quist, Golan Yona 

December 2004 The Journal of Machine Learning Research, volume 5 
Publisher: MIT Press 

Full text available: pdf(508.39 KB) Additional Information: full citation, abstract, references, index terms

We present a novel approach for embedding general metric and nonmetric spaces into 
low-dimensional Euclidean spaces. As opposed to traditional multidimensional scaling 
techniques, which minimize the distortion of pairwise distances, our embedding algorithm 
seeks a low-dimensional representation of the data that preserves the structure 
(geometry) of the original data. The algorithm uses a hybrid criterion function that 
combines the pairwise distortion with what we call the geometric distortion. T ... 

11 Session P12: meshes: Efficient compression and rendering of multi-resolution
meshes

Zachi Karni, Alexander Bogomjakov, Craig Gotsman

October 2002 Proceedings of the conference on Visualization '02 VIS '02 
Publisher: IEEE Computer Society 

Full text available: pdf(3.02 MB) Additional Information: full citation, abstract, references, citings

We present a method to code the multiresolution structure of a 3D triangle mesh in a 
manner that allows progressive decoding and efficient rendering at a client machine. The 
code is based on a special ordering of the mesh vertices which has good locality and 
continuity properties, inducing a natural multiresolution structure. This ordering also 
incorporates information allowing efficient rendering of the mesh at all resolutions using 
the contemporary vertex buffer mechanism. The performance of o ... 

Keywords: geometry coding, progressive compression, rendering, wavelets 



12 Clustering gene expression patterns
Amir Ben-Dor, Zohar Yakhini 

April 1999 Proceedings of the third annual international conference on 
Computational molecular biology RECOMB '99 

Publisher: ACM Press 

Full text available: pdf(1.06 MB) Additional Information: full citation, references, citings, index terms




13 Manifolds and modeling: Surface modeling and parameterization with manifolds




Cindy Grimm, Denis Zorin 

July 2005 ACM SIGGRAPH 2005 Courses SIGGRAPH '05 
Publisher: ACM Press 

14 Surface modeling and parameterization with manifolds: Surface modeling and
parameterization with manifolds: Siggraph 2006 course notes

Author presentation videos are available from the citation page

Cindy Grimm, Denis Zorin 

July 2006 ACM SIGGRAPH 2006 Courses SIGGRAPH '06

Publisher: ACM Press 

Full text available: pdf(17.85 MB), mov(251.00 bytes) Additional Information: full citation, abstract, references

Many diverse applications in different areas of computer graphics, including geometric 
modeling, rendering and animation, require dealing with sets which cannot be easily 
represented with a single function on a simple domain in a Euclidean space: Examples 
include surfaces of nontrivial topology, environment maps, reflection/transmission 
functions, light fields, configuration spaces of animation skeletons, and others. In most 
cases these objects are described as collections of functions defined o ... 

A method for concise, faithful approximation of complex 3D datasets is key to reducing
the computational cost of graphics applications. Despite numerous applications ranging 
from geometry compression to reverse engineering, efficiently capturing the geometry of 
a surface remains a tedious task. In this paper, we present both theoretical and practical 
contributions that result in a novel and versatile framework for geometric approximation 
of surfaces. We depart from the usual strategy by casting ... 

Keywords: Lloyd's clustering algorithm, anisotropic remeshing, geometric approximation, 
geometric error metrics, surfaces 

Interactive spoken dialogue provides many new challenges for natural language
understanding systems. One of the most critical challenges is simply determining the 
speaker's intended utterances: both segmenting a speaker's turn into utterances and 
determining the intended words in each utterance. Even assuming perfect word 
recognition, the latter problem is complicated by the occurrence of speech repairs, which 
occur where speakers go back and change (or repeat) something they just said, the 
word ... 

The development of effective content-based multimedia search systems is an important
research issue due to the growing amount of digital audio-visual information. In the case
of images and video, the growth of digital data has been observed since the introduction
of 2D capture devices. A similar development is expected for 3D data as acquisition and 
dissemination technology of 3D models is constantly improving. 3D objects are becoming 
an important type of multimedia data with many promising appl ... 

The technology underlying text search engines has advanced dramatically in the past
decade. The development of a family of new index representations has led to a wide 
range of innovations in index storage, index construction, and query evaluation. While 
some of these developments have been consolidated in textbooks, many specific 
techniques are not widely known or the textbook descriptions are out of date. In this 
tutorial, we introduce the key techniques in the area, describing both a core impl ... 

Keywords: Inverted file indexing, Web search engine, document database, information 
retrieval, text retrieval 

Both document clustering and word clustering are well studied problems. Most existing
algorithms cluster documents and words separately but not simultaneously. In this paper 
we present the novel idea of modeling the document collection as a bipartite graph 
between documents and words, using which the simultaneous clustering problem can be 
posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use 
a new spectral co-clustering algorithm that uses the second left and ... 